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ABSTRACT 

Line-of-sight velocity distributions are crucial for unravelling the dynamics 
of hot stellar systems. We present a new formalism based on penalized 



likelihood for deriving such distributions from kinematical data, and evaluate 
^ . the performance of two algorithms that extract N(V) from absorption-line 

spectra and from sets of individual velocities. Both algorithms are superior to 
existing ones in that the solutions are nearly unbiased even when the data are so 
poor that a great deal of smoothing is required. In addition, the discrete- velocity 
algorithm is able to remove a known distribution of measurement errors from 
the estimate of N(V). The formalism is used to recover the velocity distribution 
of stars in five fields near the center of the globular cluster u> Centauri. 



1. Introduction 

The most complete kinematical information obtainable for a distant stellar system is 
the distribution of line-of-sight velocities at every point in the image. Velocity distributions 
are crucial for understanding the dynamical states of slowly-rotating stellar systems like 
elliptical galaxies and globular clusters, since velocity dispersions alone place almost no 
constraints on the form of the potential unless one is willing to make ad hoc assumptions 
about the shape of the velocity ellipsoid ( pejonghe fc Merritt 1992| ). Velocity distributions 
are also useful when searching for kinematically distinct subcomponents (e.g. |Franx fc 
illingworth 1981 ; |Rix et al. 19921) . 



The velocity distribution at point R in the image of a stellar system, A(R, V), can 
be related to the data in different ways depending on the nature of the observations. In 
a system like a globular cluster, for which the data usually consist of individual stellar 
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velocities, the velocity distribution is just the frequency function of stellar velocities defined 
by those stars with apparent positions near to R. Since measured velocities are always in 
error, the observed and true JV(V)'s are related via a convolution integral. In a distant, 
unresolved galaxy, one typically measures the integrated spectrum of many stars along a 
line of sight. The observed spectrum is then a convolution of the velocity distribution of 
these stars with the broadening function of the spectrograph, and the spectrum of a typical 
star. 

With both sorts of data, the goal is to find a function N(V), at some set of points R, 
such that J2iL{Yf, N(V)} - the log likelihood of observing the data Yi given N - is large. 
Maximizing this quantity over the space of all possible functions N(V) is unlikely to yield 
useful results, however, since any N(V) that maximizes the likelihood (assuming it exists, 
which it often will not) is almost certain to be extremely noisy. This is obviously true if 
the data are related to the model via a convolution, since the process of deconvolution 
will amplify the errors in the data. But it is equally true if N(V) is simply the frequency 
function of observed velocities, since the most likely distribution corresponding to an 
observed set of Vs is just a sum of delta functions at each of the measured velocities. One 
is therefore forced to place smoothness constraints on the solution. 

But smoothing always introduces a bias, i.e. a systematic deviation of the solution 
from the true N(V). The nature of the bias is obvious when the smoothing is carried out by 
imposing a rigid functional form on N(V), since the true function will almost certainly be 
different from this assumed form. But even nonparametric smoothing generates a bias since 
it effectively averages the data over some region. Furthermore, because the required degree 
of smoothing increases with the amplitude of the noise in the data, the error from the bias 
goes up as the quality of the data falls. An ideal algorithm for estimating N(V) would 
therefore be one in which the bias introduced by the smoothing was effectively minimized, 
so that the derived N(V) was close to the true function even when the data were so poor 
that a great deal of smoothing was required. 

One way to accomplish this is to make use of prior knowledge about the likely form of 
N(V). Many studies of stellar and galactic systems have shown that N(V) is often close to 
a Gaussian. This fact suggests that we infer N(V) by maximizing a quantity like 

log£ p = 5>{Yi; N(V)} - aP(N), (1) 

i 

the "penalized log likelihood," where the penalty functional P(N) is large for any N(V) 
that is noisy and zero for any N(V) that is Gaussian. A natural choice for such a penalty 
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functional has been suggested by [Silverman ( 1982 



P G {N)= {d/dVf log N(V) dV. (2) 

J — oo L J 

This functional assigns zero penalty to any N(V) = N exp[—(V — Vo) 2 /2a 2 }, i.e. any 
Gaussian velocity distribution, and a large penalty to any N(V) that is rapidly varying. The 
limiting estimate as the smoothing parameter a tends to infinity is the normal distribution 
that best corresponds, in a maximum-likelihood sense, to the data. Thus varying a takes 
one from an estimate of N(V) that is very noisy but that reproduces the data well, to 
the "infinitely smooth" maximum likelihood Gaussian fit to N(V). When the data are 
copious and accurate, the degree of smoothing required, i.e. the value of a, will be small 
and the inferred N(V) will be close to the true function. When the data are poor, a must 
be increased to deal with the noise; however even a very large value of a will yield an 
N(V) that is nearly Gaussian and that is therefore likely to be close to the true velocity 
distribution. Regardless of the quality of the data, there will always be a formally optimal 
choice of a that yields a solution that is closest in some sense to the true N(V) - neither 
too noisy nor too biased. 

Here we apply penalized likelihood methods to the recovery of velocity distributions 
from kinematical data of two sorts: Doppler-broadened spectra, and discrete velocities. 
There are of course a large number of excellent algorithms already in use for extracting 
velocity distributions from absorption-line spectra ([Franx fc Illingworth 1988] ; [Bender 



[T990t |Kix fc White 1992| ; |Kuijken fc Merrifield 1993] |van der Marel fc Franx T993| ; ^aEa 



fe Williams 1994] ). Is another approach really needed? Many of the existing algorithms 



are essentially nonparametric and hence yield unbiased estimates of N(V) when the 
signal-to-noise ratio of the data is large. Their behavior given data with small S/N is more 
variable, however, due to the different ways in which they carry out the smoothing. For 
instance, in algorithms that represent the unknown N(V) via a basis-set expansion, 



K 



N{V) = J2C k N k (V), (3) 



k=l 



the smoothing is accomplished by truncating the expansion after a finite number of terms. 
Adding more terms decreases the bias by giving the algorithm more freedom to match 
the true N(V); however the level of fluctuations in the solution increases as the bias falls 
since the high-order terms will always try to reproduce the noise in the data. The optimal 
estimate for data of a given quality will therefore contain only a limited number of terms, 
and if the data are poor, this number will be small; hence the solution will be biased 
toward the functional form selected for Nx(V). Of course one can choose Ni to be the 
normal distribution (e.g. [van der Marel fc Franx 1993| ) but its mean and variance must be 
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specified, and the optimal choices for both parameters will depend in some complex way on 
the number of terms retained in the expansion and on the level of noise in the data. Worse, 
one does not have complete freedom in this approach to adjust the smoothing to its optimal 
level since the basis set is fixed and its terms are discrete. 

Most of the spectral deconvolution schemes currently in use have continuously- 
adjustable smoothing and so do not suffer from this latter defect. However few of these 
schemes incorporate any prior knowledge about the likely form of N(V) and so the 
solutions they return given noisy data tend to be unphysical, with functional forms that 
are determined primarily by the mechanics of the smoothing. (In fact it is a common 
practice to fall back on parametric methods when the data are poor.) For instance, both 
the Wiener-filtered algorithm of Rix & White (1992) and the kernel-based algorithm of 
Kuijken & Merrifield (1993) yield N(V) ~ constant in the limit of infinite smoothing. Since 
real velocity distributions are unlikely to be well approximated by constant functions, the 
estimates produced by these schemes can be substantially biased. 

The approach advocated here is neither better nor worse than the existing ones when 
the data are of high quality. However it has an advantage when the data are poor, since 
even a large degree of smoothing is likely to bias the solution only slightly. Furthermore 
the amount of smoothing can be adjusted with infinite precision by varying a. In practice 
it may be difficult to find the optimal choice of a given noisy data, but there are bootstrap 
techniques for choosing a that work well in most cases; and one can always elect to be 
guided by physical intuition when deciding how smooth N(V) should be. Most important, 
even a gross overestimate of a will yield no worse than a Gaussian fit to the velocity 
distribution, which is perhaps the best that can be hoped for when the signal-to-noise ratio 
or the number of observed velocities is small. 

(Much of the uncertainty in JV(V)'s derived from absorption line spectra is due to 
systematic errors such as incorrect continuum subtraction, template mismatch, limited 
resolution, etc. These problems afflict every spectral deconvolution scheme and have been 
treated at length by other authors. We have nothing new to say about these systematic 
effects and will simply ignore them - our focus is on the uncertainty in N(V) that is 
generated by noise in the data rather than by systematic errors.) 

Below we evaluate the performance of penalized-likelihood algorithms for estimating 
N(V) from spectral data (§2) and from discrete velocities (§3). The former sort of data 
are routinely obtained for distant galaxies and the latter for globular clusters, clusters of 
galaxies and systems of emission-line objects around galaxies. The same formalism works 
equally well in both cases; essentially all that needs to be changed is the form of the matrix 
that relates the observable quantities to N(V). We also show how the cross-validation 
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score can be used to estimate the optimal degree of smoothing from the data. We then 
(§4) apply the formalism to the recovery of the velocity distribution near the center of the 
globular cluster uj Centauri using a new sample of stellar radial velocities from an imaging 
spectrophotometer. 



2. Spectra 

The optical spectrum at any point in the image of a galaxy is made up of the sum of 
the spectra of its component stars along the line of sight, Doppler shifted according to their 
line-of-sight velocities V. If the galaxy spectrum is /(A) and the spectrum of a single star - 
after convolution with the instrumental response - is T(A), then 

/+oo 
N(V)T(\n\-V/c) dV = NoT, (4) 
-oo 

the convolution of N(V) with the template. We accordingly seek a function N = N(V) 
such that 

- log C p = £[I(A«) - (N o T)i} 2 + aP G (N) (5) 

i 

is minimized, where I(Xi) is the observed spectral intensity at wavelength Aj. 

We first define a discrete grid in velocity space, Vi, V m , where Vj — V\ + (j — 1)5 
and 5 = (V m - V\)/(m - 1). Typically V x « V - 4cr and V\ ps V + 4a, where V is the 
mean velocity of the system and a is the expected velocity dispersion. The number of grid 
points m should be large enough that the numerical solution to the minimization problem 
is essentially independent of m, yet small enough that the minimization — which requires 
a computational time that scales as ~ m 3 — does not take inordinately long. We mostly 
used m = 50 velocity grid points in what follows. A discrete representation of the right 
hand side of Eq. (Q) on this grid is 

m-2 

- (N o T)i} 2 + a5- 5 ]T [- logiVj-i +31ogiV j - 31ogiV i+1 + logiV J+2 ] 2 . (6) 

i j=2 

In the first term, o T is computed by fitting an interpolating spline to the template 
spectrum and assuming a linear dependence of the (unknown) function iV on V between 
the grid points. The convolution can then be represented as a matrix multiplication, i.e. 

(NoT) i n y £ l A ij N j , (7) 



A 



'j 



'j+lij ij Vj—xli,j—l Ji,j—X 
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kj= f J+1 T(\n\ t -V/c)dV } (9) 

JVj 

Jii = f 3+1 V T (In Aj - V/c) dV. (10) 

JVj 

One then varies the m parameters Nj until a minimum is obtained; the use of log N in the 
penalty functional guarantees that the solution will remain everywhere positive without the 
need to impose additional positivity constraints. Minimization was carried out using the 
NAG routine E04JAF and required about one minute on a DEC Alpha 3000/700 machine. 

If the true N(V) is a Gaussian, the optimal estimates will be obtained from this 
algorithm by choosing a to be very large - the algorithm becomes essentially parametric in 
this limit, returning the normal distribution that best approximates the data. But we would 
like to show that the algorithm works well even in cases where iV(V) deviates strongly from 
a Gaussian. The crucial problem, of course - both for this algorithm and any other - lies in 
choosing the correct degree of smoothing. 

Figure 1 shows three estimates of N(V) based on pseudo-data generated from the very 
non-Gaussian velocity distribution 

N(V) = (0.7exp[-(V - V x ) 2 l2o 2 \ + 0.3exp[-(F - V 2 ) 2 /2a 2 }) , (11) 



with V\ = 140km s l , V 2 = —140km s 1 and a = 80 km s 1 . This N(V) was found 
by Merrifield fc Kuijkcn (1994) to provide a good description of the stars in the disk of 



the Sab galaxy NGC 7217. Merrifield & Kuijken's broadening function was convolved 
with a template spectrum from a K0III star, kindly provided by T. Williams; to this 
broadened spectrum was added Gaussian random noise with amplitude such that 
I /a N = S/N = 20. The resultant spectrum was then sampled at 800 points in the wavelength 
range 4800A < A < 5500A and these "data" were given to the minimization routine. Figure 
1 illustrates the famous "bias-variance" tradeoff of solutions to ill-conditioned problems. 
When a is small, the estimated iV(V) matches the true function in an average sense, but 
fluctuates strongly from grid point to grid point. For larger a, the estimate is nearly 
correct, with fluctuations of only a few percent around the true N(V). When a is increased 
still more, the solution is driven toward a single Gaussian with roughly the same mean and 
dispersion as that of the true N(V). All of these estimates for A^(V^) reproduce the input 
spectrum with acceptable accuracy, but N(V) itself is correctly recovered only when a is 
chosen appropriately - a characteristic feature of ill-conditioned problems. Nevertheless an 
appropriate choice of a allows one to recover even this very non-Gaussian N(V) with good 
accuracy. 
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Figure 1 also shows three attempts to recover the broadening function 



N(V) 



1 



0.4 / - ,„ ox 0.6 



exp (-1/ 2 /2a 1 2 ) + — exp (-V 2 jlo. 



01 v '02 



(12) 



with (Ti = 200 km cr 2 = 50 km s -1 . This iV(V) is more peaked than a Gaussian, and is 
designed to represent the velocity distribution in the halo of a spherical galaxy dominated 
by radial orbits ( Pejonghe 1987| ). Once again the nature of the solution depends on a, with 
large values of a generating a Gaussian approximation to the true broadening function. But 
the optimal a produces an estimate of N(V) that is again quite close to the true function, 
and the velocity dispersion derived from the estimated N(V) remains close to the true 
dispersion even when a is far from its optimal value. 

The examples just discussed were based on data with a high signal-to-noise ratio. 
A more stringent test is the recovery of a non-Gaussian N(V) from noisy data. Since a 
larger degree of smoothing will be required, the penalty functional will tend to drive the 
solution away from the true N(V) as the level of the noise increases. Figure 2 shows how 
the average error in the estimated N(V) varies with the signal-to-noise ratio of the data. 
For each value of S/N, 300 random realizations of the same spectrum were generated from 
the Merrifield-Kuijken broadening function and the value of a that minimized the average, 
integrated square deviation between the estimates N and the true function N was found. 
The integrated square error was defined as 



TOT , / N(V) - N(V) 
ISE = — 



dV 

— ; (13) 



JN 2 (V)dV 

the normalizing function in the denominator was added so that a value ISE « 1 corresponds 
to an rms deviation of roughly 100% between N and N. The MISE was then defined as the 
mean value of the ISE over the 300 realizations using this "optimal" value of a. Figure 2 
shows that the MISE falls roughly as a power law with S/N, with a logarithmic slope of 
approximately —1.3. The same information is presented in more detail in Figure 3, which 
shows the average estimate and its 95% variability bands for each value of S/N. This figure 
suggests that the existence of two peaks in N(V) is recoverable for S/N > 5. 

In these examples, the optimal choice of a and its associated MISE could be computed 
since the correct form of N(V) was known. In general one does not know N(V), of course, 
and the choice of a must be based somehow on information contained within the data itself. 
One way of doing this is via the "cross-validation score" ( |Green fc Silverman 1994] , p. 30; 



Wahba 1990| , p. 47). One seeks the value of a that minimizes 

CV(«) = -j:[/(A,)-(iVWoT) i ] 2 , (14) 
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where is the minimizer of Eq. (|]) with the zth data point left out. In effect, the CV 
measures the degree to which the spectral intensities predicted by N(V) are consistent with 
the observed intensities. This is not quite the same as asking how close the estimated N(V) 
is to the true N(V), but it is probably the best that one can do in practice flWahba 198(J| ). 

Figure 4 shows the result of two attempts to recover the optimal a from fake spectra 
generated using the Merrifield-Kuijken broadening function with S/N= 5 and 10. The 
values of a that minimize the CV are tolerably close to the values that actually minimize 
the ISE. More to the point, the ISE of the i\T(V)'s generated using the CV estimates of a 
differ only negligibly from those obtained using the optimal en's. These examples suggest 
that one can indeed hope to recover a useful estimate of the optimal smoothing parameter 
from the data alone. At the very least, such an estimate would provide a starting point 
when searching for the a the produces the physically most appealing estimate of N(V). 



3. Discrete Velocities 

In many stellar systems, information about N(V) is most naturally obtained in the 
form of discrete velocities. If the velocities are measured with negligible error, N(V) is 
simply their frequency function, which can be defined as the function that maximizes the 
penalized log-likelihood 

n 

\ogC p = £logJV(Vi) - aP G (N) (15) 

8=1 

subject to the constraints 

j N(V)dV = 1, N(V) > (16) 

( [Thompson fc Tapia 199Q , p. 102). The penalty functional is needed since, in its absence, 
the optimal estimate would be a set of delta-functions at the measured velocities. 

But the uncertainty in the measured velocities is sometimes comparable to the width 
of N(V). For instance, radial velocities of faint stars in globular clusters may have 
measurement errors of a few km s~ x compared to intrinsic velocity dispersions of ~ 10 km 
s . The observable function is then not N(V) but rather its convolution with the error 
distribution, N o P. Assuming that the errors have a normal distribution, with dispersion 
a at, we have 

NoP= / N(V')exp -{V-V'f/2a 2 dV (17) 



27TCJ at 

and one accordingly seeks the N(V) that maximizes 

n 

log£ p = $>g(iVoP) 4 - a P G (jV) 
i=i 
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subject to the same constraints on N. Following Silverman (1982), we find the solution to 
this constrained optimization problem as the unconstrained maximizer of the functional 

n , 

2 MiV o p)i - aP G {N) -n N{V)dV. (19) 
i=i 

This problem is formally very similar to the deconvolution problem solved above, and a 
discrete representation of flH3| ) on a grid in velocity space is 

J2\og(NoP) l + a6- 5 J2 [-logN j _ 1 +31ogJV J --3bgJV i+ i+logJV i+2 ] 2 -n53e J -iV i , (20) 

i=l j=2 j=l 

with e\ = e m = 5/2, e 3 - = 8,j = 2, ...,m — 1. The convolution of iV with P is again 
represented as a matrix, Eq. ([?[), with 



fVj+i 




/ exp 


2a% I 



3/ 



-(V-Vi 

\j Also jv 



2^ 



dV, (21) 
dV, (22) 



which can be expressed in terms of the error function. 



Noise in the data now comes from two sources: measurement errors, as described by 
<jn] and finite-sample fluctuations due to the limited number n of measured velocities. 
Figure 5 shows how the MISE depends on n for data sets generated from the velocity 
distribution 

N(V) = (exp[-(V - Vl) 2 /2a 2 ] + exp[-(V - V 2 f/2a 2 ]) , (23) 



2vra 

with V\ = 9 km s _1 , Vi = —9 km s~ 1 and a = 8 km s _1 . This flat-topped velocity 
distribution was designed to mimic N(V) near the projected center of a globular cluster 
containing an abundance of nearly-radial orbits. (Examples of such models and their 
velocity distributions may be found in |Merritt 1989| ). 

The two sets of points in Figure 5 correspond to pseudo-data generated with zero 
velocity errors (circles) and with <tjv = 4 km s -1 (squares); the latter value is roughly 
one-third of the intrinsic velocity dispersion. The MISE falls off roughly as n -0 9 for both 
types of data, almost as steep as the n~ l dependence of parametric estimators. Of course 
the mean error is larger, at a given a, for the sample with nonzero <jn', however an increase 
in sample size from n = 200 to n = 300 produces the same decrease in the MISE as a 
reduction in the measurement uncertainties from 4 to km s" 1 . Thus even relatively large 
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measurement errors can be overcome by a modest increase in sample size (assuming, of 
course, that the distribution of errors is well understood). Figure 6 shows the average 
estimates obtained with the optimal smoothing parameters and their 95% variability bands. 
The non-Gaussian nature of N(V) is surprisingly well reproduced even for n = 200, but the 
two peaks only begin to be clearly resolved for n = 1000. 

Various techniques, including a version of the cross-validation score described above, 
can be used to estimate optimal smoothing parameters for data like these. The "unbiased" 
or "least-squares" cross-validation score (Scott 1992, p. 166; Silverman 1986, p. 48) is 
defined as 

r 2 n 

UCV(a) = (No PfdV - - Y(N® o P) { (24) 
J nfr[ 

where is an estimate of N obtained by omitting the ith velocity. The value of a that 
minimizes the UCV is an estimate of the value that minimizes the ISE of N o P. Figure 
7 shows the dependence of the UCV on a for two data sets, with n = 200 and n = 500, 
generated from the velocity distribution of Eq. ([23|) with <jn = 0. The minimum in both 
curves occurs at a value of a close to the value that actually minimizes the ISE. 

The results just presented suggest that increasing the number of measured velocities in 
globular clusters may be a greater priority than reducing measurement errors if the goal is 
to determine N(V), since existing techniques can already extract stellar radial velocities 
with greater precision than the uncertainty — 4 km s -1 adopted here. Because one 
would like to estimate N(V) at several different points in an image, the total number of 
velocities required for a single globular cluster would be in the thousands at least. 



4. Application to ui Centauri 

Fortunately, data sets of this size are now becoming available for a number of stellar 
systems. Here we analyze radial velocities of a new sample of 4200 stars near the center 
of the globular cluster u Centauri. The velocities were measured using the Rutgers 
Fabry-Perot interferometer on the CTIO 1.5m telescope and were kindly made available 
by C. Pryor, who also carried out an analysis of the measurement errors. Figure 8 shows 
the spatial distribution of the observed stars. The effective field of view of the Fabry-Perot 
is about 2.75 arc minutes in radius and is offset by about 1.1 arc minutes E/SE from the 
cluster center as determined by Meylan et al. (1995). The core radius of u Centauri is 
around 2.5 arc minutes ( [Peterson fc King 1975| ) so the observed velocities lie mostly within 
the projected core. 
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The core of a globular cluster is perhaps not a very auspicious place to look for 
non-Gaussian velocity distributions. However u Centauri is fairly young in a collisional- 
relaxation sense, with an estimated central relaxation time of a few billion years ( |Meylan"et 



al. 1995|). Thus its velocity distribution might still be non-Maxwellian. Furthermore there 



is evidence for two, chemically distinct populations in uj Centauri (Norris et al. 1996) which 
may have formed at different epochs and hence may have different kinematics. Finally, 
we note that the theoretical expectation of Maxwellian velocity distributions in globular 
clusters has rarely been tested by direct determination of line-of-sight velocity distributions. 
For this reason alone, it seems worthwhile to estimate N(V) in u Centauri. 

About 20 stars were removed from the original sample because their Fabry-Perot line 
profiles showed contamination by Ha emission. Velocity uncertainties of the remaining stars 
were estimated using standard procedures (e.g. |Gebhardt et al. 1994j ) ; a typical estimated 



error was 3-5 km s _1 . Although the estimated errors were found to correlate with stellar 
magnitude, this fact was ignored in the analysis and was simply set equal to the rms 
estimated uncertainty of all the stars in each subsample, about 4 km s _1 in each case. For 
comparison, the central velocity dispersion of to Centauri is 15-20 km s _1 (Meylan et al. 
1995). 

One would like to estimate N(V) independently at many different points in this 
observed field of view. However Figure 6 suggests that sample sizes less than a few hundred 
are not very useful for detecting departures from normality. The compromise, as illustrated 
in Figure 8, was to identify five, partially overlapping fields of one arc minute radius 
containing about one-half of the observed stars. Two fields lie along the estimated rotation 
axis of the cluster, at a position angle of 30° east from north with their centers displaced 
one arc minute from the cluster center. The three additional fields were situated along a 
perpendicular line. The orientation of the rotation axis was estimated from this sample, 
but is consistent with a determination based on ~ 500 stars with velocities measured by 
CORAVEL ( [Meylan 1996[ ). The offset between the Fabry-Perot field of view and the cluster 



center was used to advantage by centering the fifth field at a distance of 2.5 arc minutes 
from the cluster center along the direction of maximal rotation. 

The number of stars in each of the subsamples, along with their mean velocity and 
velocity dispersion (with estimated errors removed) in km s _1 , are given in Table 1. The 
average number of stars per field was 653 for the four inner fields, with 456 stars in Field 5. 

The velocity distributions in Fields 1 and 5 exhibit the greatest apparent departures 
from normality. Figure 9 shows the dependence of the inferred N(V) in these two fields on 
the value of the smoothing parameter a. Figure 10 displays estimates of N(V) in all five 
fields, for one choice of a, and their estimated 95% confidence bands. The confidence bands 
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were computed in the usual way via the bootstrap (Scott 1992, p. 259); the choice of a is 
justified below. Also shown are the normal distributions with the same mean and variance 
as the inferred N(V)'s. 

None of the recovered velocity distributions is strikingly non-Gaussian, although the 
confidence bands in Fields 1 and 5 do barely exclude the normal distribution at one or 
more points. The inferred N(V) for Field 5 is more centrally peaked than a Gaussian and 
has what might be described as a tail or bump at large positive velocities. It is tempting 
to interpret this curve as resulting from the superposition of two normal distributions with 
different means and variances, though such an interpretation seems extravagant given the 
relatively small amplitude of the deviations. 

The choice of the optimal a for these estimates presented certain difficulties. When 
dealing with random samples drawn from the normal distribution, the value of a that 
minimizes the error in an estimate of N(V) would clearly be very large, since an infinite a 
always returns a normal distribution. Because the velocity distributions in uj Centauri do 
appear to be quite Gaussian, we might expect the optimal a's to be "nearly infinite" and 
hence difficult to estimate from the data alone. In fact the UCV score (Eq. ^4|) was found 
not to have a minimum at any value of a in any of these five sub-samples; instead the 
function UCV(a) was always found to asymptote to a constant value as a was increased. A 
similar result was obtained using the "likelihood cross-validation" score (Silverman 1986, p. 
53). These results certainly do not imply that the distribution of velocities in uj Centauri 
must be exactly Gaussian, since cross-validation is not a precise prescription for determining 
a and often fails to give an extremum. But it appears that these velocity distributions 
are close enough to Gaussian that the cross-validation technique can not find a significant 
difference between the estimates made with finite and infinite a. 

The following alternative scheme was adopted for selecting the optimal a. A crude 
estimate of iV o P can be obtained by replacing each of the measured velocities by a kernel 
function of fixed width. If iV o P is exactly Gaussian, and if the kernel is also Gaussian, the 
optimal window width (i.e. dispersion) for the kernel may be shown to be 

h opt = L06<m -1/6 (25) 

(Silverman 1986, p. 45). Fixed-kernel estimates of N o P in each of the five fields were 
generated from the data using this optimal h, and compared to estimates of N o P using 
the penalized-likelihood algorithm with various values of a. These comparisons could not 
be made precise since the fixed-kernel estimates have a larger bias and do not compensate 
for the velocity measurement errors. Nevertheless it is reasonable to assume that the degree 
of "roughness" in the optimal kernel-based estimates ought to be similar to that in the 
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penalized-likelihood estimates made with the optimal value of a. The plots of N(V) in Fig. 
10 were made using these estimates of the optimal a's. 

We conclude from this analysis that the evidence for non-Gaussian velocity distributions 
in our five fields near the center of u Centauri is marginal at best. We note that the 
strongest deviations appear in the field that is farthest from the center. Perhaps a sample of 
stars even farther from the core - where the relaxation time exceeds the age of the universe 
- would show even larger departures from normality. 

The us Centauri radial velocities analyzed here were obtained by K. Gebhardt, J. 
Hesser, C. Pryor and T. Williams using the Rutgers Fabry-Perot interferometer. C. Pryor 
devoted considerable time to reducing the observations and estimating the measurement 
uncertainties in this sample. Conversations with C. Joseph, M. Merrifield, H.-W. Rix, P. 
Saha, R. van der Marel and T. Williams were very helpful for understanding the ins and 
outs of spectral deconvolution. C. Pryor read parts of the manuscript and made useful 
suggestions for improvements. This work was supported by NSF grant AST 90-16515 and 
NASA grant NAG 5-2803. 



Table 1: Fields in u Centauri 



Field N (V) a 

~~1 655 0. 16T 

2 670 -2.7 16.7 

3 624 0.1 16.8 

4 663 2.2 16.6 

5 456 6.4 14.6 
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Fig. 1.— 

Bias- variance tradeoff in estimates of non-Gaussian iV(V)'s from absorption line 
spectra. Input spectra were generated using two different broadening functions (thin 
curves) with added noise of amplitude S/N=20. (a) - (c): Merrifield-Kuijken broadening 
function, Eq. (|TT|); (d) - (f): broadening function of Eq. ([i2|) . N(V) was then estimated 



(thick curves) using three different values of a. (a), (d): undersmoothed; (b), (e): optimally 
smoothed; (c), (f): oversmoothed. In the limit of large a, i.e. infinite smoothing, the 
estimates tend toward a Gaussian with approximately the same mean and dispersion as the 
true N(V). 

Fig. 2.- 

Dependence of the mean integrated square error of the recovered broadening function on 
the signal-to-noise ratio of the spectrum, for spectra generated from the Merrifield-Kuijken 
broadening function (Eq. (|TT1). For each value of S/N, 300 noise realizations of the same 
spectrum were generated and the value of a that minimized the average square deviation 
between the true and estimated iV(V)'s was found. The ordinate is the MISE of the 
estimates using this optimal a. 

Fig. 3.- 

Average estimates of N(V), and their 95% variance bands, based on 300 noise 
realizations of spectra generated from the Merrifield-Kuijken broadening function (|ll|). As 
the signal-to-noise ratio increases, the average estimate tends toward the true broadening 
function and the variance of the estimates decreases. 

Fig. 4.— 

Dependence of the cross-validation score (CV) on the smoothing parameter a for two 



spectra generated using the Merrifield-Kuijken broadening function (|TT|), with S/N = 10 
and S/N = 5. The value of a at the minimum of the CV curve is an estimate of the value 
of the smoothing parameter that minimizes the integrated square error of the estimated 
N(V). Arrows indicate the values of a that actually minimize the ISE of the broadening 
functions inferred from these two spectra. 

Fig. 5.- 

Dependence of the mean integrated square error of the recovered N(V) on the number 
of velocities n, for pseudo-data generated from the flat-topped velocity distribution of Eq. 
(E3|). Squares: = 4 km s _1 ; circles: = 0. 
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Fig. 6.— 

Average estimates of N(V), and their 95% variance bands, based on 300 samples of 
discrete velocities generated from the flat-topped velocity distribution of Eq. (|23|) . 

Fig. 7.- 

Dependence of the unbiased cross validation score (UCV) on the smoothing parameter 
a for data generated from the velocity distribution of Eq. (|23|), with = 0. The value of a 
at the minimum of the UCV curve is an estimate of the value that minimizes the integrated 
square error of N(V). Arrows indicate the values of a that actually minimize the ISE of 
the velocity distributions inferred from these two data sets. 

Fig. 8.— 

Map of stars with observed radial velocities in the globular cluster u Centauri. The 
origin of the coordinate system is the center of the cluster as defined by Meylan & Mayor 
(1986); major tick marks are separated by one arc minute. The Fabry- Perot field of view is 
offset by approximately 1.1 arc minute from the origin. The dashed line is an estimate of 
the projected rotation axis of the cluster. Velocity distributions were computed for stars in 
the five circular fields shown. 

Fig. 9- 

Penalized- likelihood estimates of N(V) in two fields in u Centauri. The value of the 
smoothing parameter a increases from (a) to (c). Thin lines are normal iV(V)'s with the 
same mean velocity and velocity dispersion as the estimated iV(V)'s. 

Fig. 10.— 

Penalized-likelihood estimates of N(V) in five fields in ui Centauri. Heavy solid lines 
are the estimates; dashed lines are 95% bootstrap confidence bands; thin solid lines are the 
normal distributions with the same mean velocity and velocity dispersion as the estimated 
iV(V)'s. Arrows indicate mean velocities. 
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